Designing q-Unique DNA Sequences with Integer Linear Programs and Euler Tours in De Bruijn Graphs

Authors

  • Marianna D'Addario
  • Nils Kriege
  • Sven Rahmann
Abstract

DNA nanoarchitectures require carefully designed oligonucleotides with certain non-hybridization guarantees, which can be formalized as the q-uniqueness property on the sequence level. We study the optimization problem of finding a longest q-unique DNA sequence. We first present a convenient formulation as an integer linear program on the underlying De Bruijn graph that allows a variety of constraints to be incorporated flexibly; solution times for practically relevant values of q are short. We then provide additional insights into the problem structure using the quotient graph of the De Bruijn graph with respect to the equivalence relation of reverse complementarity. Specifically, for odd q the quotient graph is Eulerian, and finding a longest q-unique sequence is equivalent to finding an Euler tour, so the problem is solved in time linear in the length of the output string. For even q, self-complementary edges complicate the problem, and the graph has to be Eulerized by deleting a minimum number of edges. Two sub-cases arise: we present a complete solution for one of them, while the other remains open.
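To make the q-uniqueness property concrete, the following Python sketch checks a candidate sequence. The abstract does not spell out the formal definition, so the sketch assumes one reading: no q-gram may occur twice, and no q-gram may occur together with its reverse complement. The function names and the `allow_self_complementary` flag are hypothetical and introduced here for illustration only; how self-complementary q-grams should be treated is an assumption, not something taken from the paper.

```python
# Watson-Crick complement table for the DNA alphabet.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(s: str) -> str:
    """Return the reverse complement of a DNA string over {A, C, G, T}."""
    return s.translate(COMPLEMENT)[::-1]


def is_q_unique(seq: str, q: int, allow_self_complementary: bool = False) -> bool:
    """Check q-uniqueness under the assumed definition described above.

    A sequence fails the check if some q-gram occurs twice, or if a q-gram
    and its reverse complement both occur. By default (an assumption of this
    sketch) a q-gram equal to its own reverse complement also fails.
    """
    seen = set()
    for i in range(len(seq) - q + 1):
        gram = seq[i:i + q]
        rc = reverse_complement(gram)
        if gram == rc and not allow_self_complementary:
            return False
        if gram in seen or rc in seen:
            return False
        seen.add(gram)
    return True
```

For example, under these assumptions `is_q_unique("ACGGT", 3)` holds, whereas `is_q_unique("ACGT", 3)` does not, because the 3-grams `ACG` and `CGT` are reverse complements of each other. Self-complementary q-grams can only exist for even q, which matches the abstract's remark that the even case is the harder one.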


Similar resources

The Number of Euler Tours of Random Directed Graphs

In this paper we obtain the expectation and variance of the number of Euler tours of a random Eulerian directed graph with fixed out-degree sequence. We use this to obtain the asymptotic distribution of the number of Euler tours of a random d-in/d-out graph and prove a concentration result. We are then able to show that a very simple approach for uniform sampling or approximately counting Euler...


The number of Euler tours of a random directed graph

In this paper we obtain the expectation and variance of the number of Euler tours of a random Eulerian directed graph with fixed out-degree sequence. We use this to obtain the asymptotic distribution of the number of Euler tours of a random d-in/d-out graph and prove a concentration result. We are then able to show that a very simple approach for uniform sampling or approximately counting Euler...


Reciprocals of Binary Power Series

If A is a set of nonnegative integers containing 0, then there is a unique nonempty set B of nonnegative integers such that every positive integer can be written in the form a + b, where a ∈ A and b ∈ B, in an even number of ways. We compute the natural density of B for several specific sets A, including the Prouhet-Thue-Morse sequence, {0} ∪ {2^n : n ∈ N}, and random sets, and we also study the ...


The number of Euler tours of a random d-in/d-out graph

In this paper we obtain the expectation and variance of the number of Euler tours of a random d-in/d-out directed graph, for d ≥ 2. We use this to obtain the asymptotic distribution and prove a concentration result. We are then able to show that a very simple approach for uniform sampling or approximately counting Euler tours yields algorithms running in expected polynomial time for almost ever...


Reducing Genome Assembly Complexity with Optical Maps Final Report

The goal of genome assembly is to reconstruct contiguous portions of a genome (known as contigs) given short reads of DNA sequence obtained in a sequencing experiment. De Bruijn graphs are constructed by finding overlaps of length k − 2 between all substrings of length k − 1 from reads of at least k bases, resulting in a graph where the correct reconstruction of the genome is given by one of th...


Publication date: 2012